Intro

The browserly module is designed to easly create track view visualizations of positional genomic data such as ChIP-seq, ATAC-seq, or DNase hypersesitivey assays. High-level functions are provided to easly make visualizations that the user can use to compare results between samples, conditions, and even genomic loci.

Plot single locus

Input Data

A trackview requires three components to display properly.

  1. The view range: a GRanges object describing the region to be displayed
  2. Annotation data: a list containing annotation features that will be rendered
  3. Coverage data: a GRanges object that contains the coverage to be displayed

View Range

Browserly needs to know the range to plot. This is used to pull the correct annotation information. The function get_view_range makes the appropriate object for use with browserly. If passed an OrganismDbi object, get_view_range can also be used to get the range by gene symbol. Here we’ll use the region for the gene GREB1, because it is a good responder to ER and PR.

view_range <- get_view_range(chr = 2, start=11482341, end=11642788)
view_range
## GRanges object with 1 range and 0 metadata columns:
##       seqnames               ranges strand
##          <Rle>            <IRanges>  <Rle>
##   [1]        2 [11482341, 11642788]      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

The range can also be extended if desired. This is a particularly useful function when creating uniformly sized views around disparate regions. For example, the TSS of many genes can be made as 1bp GRanges and extended to be of equal length.

The range can be extended symmetrically on both sides

extend_grange(view_range, 50000)
## GRanges object with 1 range and 0 metadata columns:
##       seqnames               ranges strand
##          <Rle>            <IRanges>  <Rle>
##   [1]        2 [11432341, 11692788]      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

or asymmetrically by passing a vector

extend_grange(view_range, c(1000, 5000))
## GRanges object with 1 range and 0 metadata columns:
##       seqnames               ranges strand
##          <Rle>            <IRanges>  <Rle>
##   [1]        2 [11481341, 11647788]      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Annotation data

We pre-load the annotation information from the TxDB into an easily searched list. This saves time when repeated calls are made to find the genomic ranges and features that are used in plotting coverage information. The variable tx_data contains several lists that a indexed by transcript feature (e.g. intron, utr5). It also keeps track of the seqlevelstyle used in the database.

txdb <- TxDb.Hsapiens.BioMart.igis
tx_data <- load_tx_data(txdb)
print(names(tx_data))
## [1] "intron"         "utr5"           "utr3"           "cds"           
## [5] "exon"           "transcripts"    "seqlevelsStyle"

Note: We also use the annotation helper function match_tx_to_gene to zip together OrgDB and TxDB objects by entrez id. This allows genes to be searched by gene symbol, which is often more human friendly.

symbol_table <- match_tx_to_gene(txdb, 'org.Hs.eg')
print(head(symbol_table))
## # A tibble: 6 × 6
##      entrez    chr     start       end strand  symbol
##       <chr> <fctr>     <int>     <int> <fctr>   <chr>
## 1         1     19  58346806  58353499      -    A1BG
## 2        10      8  18391245  18401213      +    NAT2
## 3       100     20  44619519  44651735      -     ADA
## 4      1000     18  27950966  28177481      -    CDH2
## 5     10000      1 243488233 243851079      -    AKT3
## 6 100008586      X  49570366  49577757      + GAGE12F

Coverage information.

Users are expected to provide their own data for plotting. Data should be provided as a GRanges object, where the metadata columns (mcols) contain the coverage values in each range.

The utility function make_coverage_tracks is provided for convenience, and has a few helpful features. It takes a list of coverage files (e.g. bigwig, bam) and the view_range to be displayed. Optionally, the user can provide corresponding names for the samples (e.g. treatment condition, antibody), and a scaling factor for the coverage data (e.g. library size).

# Get the coverage
cvg <- make_coverage_tracks(fi[idx, "File.bw"], 
                            target_range = view_range, 
                            sample_names = shortened_names, 
                            scaling_factors = sf)
head(cvg)
## GRanges object with 6 ranges and 5 metadata columns:
##       seqnames               ranges strand |            ER_input
##          <Rle>            <IRanges>  <Rle> |           <numeric>
##   [1]        2 [11482341, 11482439]      * |  0.0141685398538994
##   [2]        2 [11482440, 11482539]      * | 0.00255033717370189
##   [3]        2 [11482540, 11482639]      * |                   0
##   [4]        2 [11482640, 11482739]      * |                   0
##   [5]        2 [11482740, 11482839]      * |                   0
##   [6]        2 [11482840, 11482939]      * |                   0
##                    ER_ab    ER_ab_Pr_treat              PR_ab
##                <numeric>         <numeric>          <numeric>
##   [1]  0.136749081577534 0.117184084720934 0.0917524797025164
##   [2] 0.0107314675603833 0.120886707902033 0.0917524797025164
##   [3] 0.0825497504644868 0.261271271917296 0.0458762398512582
##   [4]  0.161797510910394 0.389957122264621  0.170659612246681
##   [5]  0.141985570798917 0.366559694928744  0.183504959405033
##   [6] 0.0734692779133933 0.190104097104003 0.0954225788906171
##           PR_ab_Pr_treat
##                <numeric>
##   [1]                  0
##   [2] 0.0515054944045998
##   [3]  0.174670807111252
##   [4]  0.223936932193912
##   [5]  0.186987338381917
##   [6]  0.063822025675265
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Standard plot

A common representation of coverage is a trackview. Browserly makes it simple to create an interactive trackview that looks similar to other packages, such as Gviz.

plot_single_locus(view_range, tx_data, cvg, stacking='squish', type="scatter")

By default, browserly densely stacks all overlapping transcripts to save visual space. Future versions may allow for the display of “canonical” transcripts. All y axes default to being on the same scale, but this can also be disabled.

plot_single_locus(view_range, tx_data, cvg, type = "scatter", sync_y = FALSE)

Heatmap of coverage

Browserly automatcially determines the type of plot for showing coverage. When many samples are displayed, browserly defaults to using a heatmap. This saves vertical space, without losing information.

plot_single_locus(view_range, tx_data, cvg)

SNP information

SNPs can be added from external sources, provided as GRanges. Here we use SNP information from the GWAS catalog.

snps <- readRDS("/gne/research/workspace/finklej/ExternalData/SNP/gwas_catalog_v1.0.1-associations_e84_r2016-07-10.rds")
snps_in_range <- get_snps_in_range(snp_gr = snps, target_range = view_range)
plot_single_locus(view_range, tx_data, cvg, type='scatter', snps = snps_in_range,
                  stacking = 'squish', sync_y = FALSE)

Additional plotly arguments

All plotly arguments that can be passed to a trace can also be passed through plot_single_locus. The outcome of these arguments is not guaranteed. Please see the plotly documentation for more information.

Mesoscale view

Browserly also provides functionality for displaying multiple genomic loci. This may be useful when the user wants to compare coverage for several genes under many different conditions. For example, prior knowledge may indicate that a set of genes shows signifcant differential expression, which may be attributable to changes in transcription factor binding between treatments.

Data preparation

The user specifies the genes of interest, as well as the coverage files for the samples that will be displayed.

genes
## [1] "RNF2"  "E2F1"  "GREB1" "BAP1"
cvg_files
## [1] "/gne/research/workspace/rinaldj1/projects/bulkChipPipeline/data/CARROLL_PR/ngs_pipeline/MCF7_Input_Full_Media_3hr/results/MCF7_Input_Full_Media_3hr.coverage.bw"
## [2] "/gne/research/workspace/rinaldj1/projects/bulkChipPipeline/data/CARROLL_PR/ngs_pipeline/SRR2000673/results/SRR2000673.coverage.bw"                              
## [3] "/gne/research/workspace/rinaldj1/projects/bulkChipPipeline/data/CARROLL_PR/ngs_pipeline/SRR2000679/results/SRR2000679.coverage.bw"                              
## [4] "/gne/research/workspace/rinaldj1/projects/bulkChipPipeline/data/CARROLL_PR/ngs_pipeline/SRR2000712/results/SRR2000712.coverage.bw"                              
## [5] "/gne/research/workspace/rinaldj1/projects/bulkChipPipeline/data/CARROLL_PR/ngs_pipeline/SRR2000718/results/SRR2000718.coverage.bw"

The browserly plotting function requires two additional arguments:

  1. A GRangesList of coverage data. Each item in the list will be on a separate subplot
  2. A GRangesList of the annotation data corresponding to the coverage region in item 1. These will be displayed on separate subplots, paired with the appropriate coverage.

Here we prepare the gene regions to be compared by centering around each gene’s TSS. This provides a uniform comparison between genes.

centered_genes <- lapply(genes, function(gene){
  gene_data <- get_centered_gene_info(gene = gene,
                                      extension = extension,
                                      cvg_files = cvg_files,
                                      sample_names = conditions,
                                      scaling_factor = sf,
                                      tx_data = tx_data,
                                      symbol_table = symbol_table)
  return(gene_data)
})

cvg_list <- GRangesList(lapply(centered_genes, function(x) {x$gene_cvg}))
names(cvg_list) <- genes
tx_list <- GRangesList(lapply(centered_genes, function(x) {x$gene_tx}))
names(tx_list) <- genes

Multi gene plot

Next we plot the gene coverage for each gene under the specified conditions. By default, the subplots are not on the same scaled. This can be changed using the same sync_y argument as in the single locus view.

plot_multiple_genes(genes = genes,
                    centered_cvg = cvg_list,
                    centered_tx = tx_list)

Adding expression data

Gene expression can be added to the plots if a named matrix is passed. The rows correspond to the genes to be plotted, and the columns are labelled by treatment condition. Columns with the same names are grouped together in box plots.

print(gene_exprs)
##              E2  E2_R5020 E2_Progesterone E2_Progesterone E2_Progesterone
## RNF2  38.516690 39.405321       37.935830       41.097939       40.706197
## E2F1   4.580853  5.716005        6.380168        5.259367        5.562977
## GREB1 25.249426 25.976837       24.504076       24.370023       23.257813
## BAP1  36.467995 53.828737       52.426210       54.923232       50.771586
##              E2        E2  E2_R5020 E2_Progesterone        E2  E2_R5020
## RNF2  36.019152 38.266073 39.506667       38.127430 34.949106 36.921805
## E2F1   5.437536  5.717051  6.265557        5.713678  5.475839  5.441846
## GREB1 24.742712 23.968580 23.353371       24.486605 23.970045 23.683170
## BAP1  42.890593 33.353263 51.972375       55.249781 36.410661 55.047490
##       E2_Progesterone  E2_R5020  E2_R5020 E2_Progesterone        E2
## RNF2        41.442032 38.909271 39.110774       34.501752 33.957258
## E2F1         5.233502  5.333585  4.715712        5.890774  6.124918
## GREB1       23.369010 23.596555 24.698469       22.812247 23.213565
## BAP1        51.558513 52.911366 49.867967       54.869362 40.483741
##              E2  E2_R5020       E2  E2_R5020        E2 E2_Progesterone
## RNF2  36.303870 37.772537 33.61251 40.195715 35.309970       42.492230
## E2F1   5.745819  5.742541  5.43056  4.692601  4.936273        4.783116
## GREB1 26.236504 25.145120 24.59093 24.936866 27.027741       26.392113
## BAP1  38.033621 60.464678 41.32940 56.584159 44.174546       49.464618
##        E2_R5020 E2_Progesterone
## RNF2  49.457395       36.838772
## E2F1   4.878229        6.387816
## GREB1 24.192221       23.500812
## BAP1  47.611069       50.668117
plot_multiple_genes(genes = genes,
                    centered_cvg = cvg_list,
                    centered_tx = tx_list,
                    gene_exprs = gene_exprs)